-
Notifications
You must be signed in to change notification settings - Fork 610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: avoid using ' ' in availability zone #1380
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@seanschneeweiss @lingxiankong @ramineni need some comments ... thanks |
Build succeeded.
|
Build succeeded.
|
Build succeeded.
|
Build succeeded.
|
Build succeeded.
|
Build succeeded.
|
Build succeeded.
|
Build succeeded.
|
Build succeeded.
|
Build succeeded.
|
Build succeeded.
|
@jichenjc pls check my comment here , #1364 (comment) Could we raise issue in k/k repo , to allow spaces? Is this issue coming only in 1.20 |
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L166 is the code reported the error I highly suspect it won't be accepted by k/k but anyway I opened kubernetes/kubernetes#98202 |
@jichenjc then this should be just be doc update , spaces not allowed in AZ names. replacing space with '-' is a problem , as what if AZ actually do exist with - , it might cause confusion. And also, lets just keep the k/k issue open and take inputs from others as well. |
I can create a doc PR on this , but if we can find the issue early |
As I commented in #1381, only described in doc that telling the k8s cluster admin that it's not possible to set the label because the az name in openstack cloud contains space. Normally, the cluster admin doesn't have access to the cloud, there is no option for the k8s cluster admins in this case. Additionally, changing az name in the cloud could involve lots of work that should be avoidable. Being that said, for the label |
@lingxiankong We can't change the AZ name in the label , replace ' ' or any special character with '-', that would fail scheduling when topology enabled in any CSI plugins . And we dont know , if there is already AZ named with '-' eg: topology.kubernetes.io/zone = "test zone" we replace it with "test-zone" , what if "test-zone" exists or we dont know if it can be created in future in openstack. That will lead to confusion and scheduling on node in wrong zone. IMO, this is used for scheduling , we should not change the value arbitrarily. |
I think several options maybe considered
|
Yes, you are right. That's why I mentioned if we go this way we should let the cluster admin know that 'test zone' and 'test-zone' will be treated as the same zone name (as @jichenjc said this is a very rare edge case), otherwise the cluster admin need to notify the cloud admin to change the zone name as required. Personally, I think this is an acceptable option compared to failing to initialize the cluster node. What do you think or what's your suggestion? |
I am +1 to this option with some document to describe the limtiations |
if folks agree, I will abandon #1381 |
@lingxiankong If its only the label that is affected , then may be we could tweak it as suggested.. As I mentioned already , scheduling fails based on the value. For instance, once we change , node AZ doesnt match with volume AZ and the scheduling fails that no valid node. So we need to tweak volume AZ also accordingly. There might be other places / plugins tweaking is necessary as the value by other sidecars to do validation. Its better to fail at node initialization , rather than failing it later stage due to some issue. Its a limitation from Kubernetes side , so I think document it clearly and let the admin know beforehand is better IMHO |
@ramineni Could you please remind me how we specify volume AZ when creating PV/PVC and where should the user get the volume AZ? Does that support space? If yes, you are right, a PV/PVC of az "test az" won't fit into node az "test-az". My understanding of the steps is (could be wrong), the user:
Because OCCM already converts "test az" to "test-az", so the user should never use "test az", there is no scheculing issue from what I can see (unless I've missed something). |
@lingxiankong yes, that can be one scenario and that requires volume to be created in correct AZ and parameters inside volume should be changed as well. The other one can be, it can be specified as storage class parameter , |
When Kubernetes user is creating StorageClass, where should she look for the correct AZ names? Isn't coming from Node label? |
I suppose its from cinder availability-zone-list |
IMHO, the k8s cluster admin should always get zones from the Node label, we shouldn't have the assumption that the k8s cluster admin is able to access openstack cloud. That's the reason I said before that replacing " " with "-" in OCCM is safe because the Node label can never contain " " as that's not allowed. |
Node zones can be different from volume zones ..thats the reason i mentioned it should be retrieved from cinder |
Well, yes, for some historical reasons, one of the things that complicates use of availability zones is that each OpenStack project implements them in their own way (if at all). However, although it's possible, I've never heard of such use cases that Nova and Cinder have configured different availability zones, because that would cause lots of issues, e.g when the nova-compute service creates the volumes to attach to the server during its creation, it will request that those volumes are created in the same availability zone as the server, which must exist in the block storage (cinder) service. But I'm happy to know more if you have (or heard of) any particular reason that doing so in real deployments. Another point is, I noticed you are using The reason I insist on getting this worked is to make sure OCCM can always be working for the cluster admins without too much hassle to change the underlying openstack cloud which the cluster admins usually don't necessarily have access to, otherwise, it's highly unlikely the OCCM could be deployed. As you said, If there are use cases that the storageclass already specify space-based az (i.e. Changing " " to "-" in node label |
@lingxiankong we do have some cases, thats the reason we support ignore-volume-az option in such cases. But I'm afraid thats not my point. All I'm saying is changing space to '-' is not that straight forward as you mentioned as we did in instance-type, it requires lot of tweaking at various parts of code, so we have to carefully consider all scenarios before changing that. How about we keep this on hold as of now and stick to doc limitation for now, and if we have many such issues reported , and is very common, we could again relook at the permanent solution. As this is the first issue reported out of all releases (as limitation is there from the start, its not newly added one), I'm assuming its not common . wdyt? |
Agree. /hold |
@jichenjc: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
closing this . As it has been decided we don't solve this issue for now but add a note in the documentation. /close |
@ramineni: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What this PR does / why we need it:
topology.kubernetes.io/zone
will be used as zone label to nodes but it does not allow blank ' 'however, blank is allowed in nova az settings thus this will lead to issue
this PR tries to solve that problem by replacing ' ' with '-'
but not sure this is the right way to go.. so get some opinion through the code change
Which issue this PR fixes(if applicable):
fixes #1379
Special notes for reviewers:
Release note: